ThreadSanitizer: data race [@ free] vs. [@ posix_memalign] in FFmpegDataDecoder
Categories
(Core :: Audio/Video: Playback, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox80 | --- | fixed |
People
(Reporter: tsmith, Assigned: bryce)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
The attached crash information was detected by ThreadSanitizer while opening twitch.tv and waiting for the page to load on mozilla-central 20200708-34fb169ef962. I am marking as s-s to be safe.
General information about TSan reports
Why fix races?
Data races are undefined behavior and can cause crashes as well as correctness issues. Compiler optimizations can cause racy code to have unpredictable and hard-to-reproduce behavior.
Rating
If you think this race can cause crashes or correctness issues, it would be great to rate the bug appropriately as P1/P2 and/or indicating this in the bug. This makes it a lot easier for us to assess the actual impact that these reports make and if they are helpful to you.
False Positives / Benign Races
Typically, races reported by TSan are not false positives [1], but it is possible that the race is benign. Even in this case it would be nice to come up with a fix if it is easily doable and does not regress performance. Every race that we cannot fix will have to remain on the suppression list and slows down the overall TSan performance. Also note that seemingly benign races can possibly be harmful (also depending on the compiler, optimizations and the architecture) [2][3].
[1] One major exception is the involvement of uninstrumented code from third-party libraries.
[2] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[3] How to miscompile programs with "benign" data races: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Suppressing unfixable races
If the bug cannot be fixed, then a runtime suppression needs to be added in mozglue/build/TsanOptions.cpp
. The suppressions match on the full stack, so it should be picked such that it is unique to this particular race. The bug number of this bug should also be included so we have some documentation on why this suppression was added.
Updated•5 years ago
|
Assignee | ||
Comment 1•5 years ago
|
||
[1] One major exception is the involvement of uninstrumented code from third-party libraries.
Can you offer any more guidance around this point? We're dealing with 3rd party code here -- if ffmpeg exists on the system we dynamically link it in.
Throwing ideas around, I see the following possibilities:
- We fix this in ffmpeg upstream assuming this is a problem in ffmpeg.
- We fix our usage of ffmpeg assuming this is a problem in how Gecko interacts with ffmpeg.
- We add a suppression assuming this is a problem in diagnostics due to dynamically linking in 3rd party code.
Assignee | ||
Comment 2•5 years ago
•
|
||
John Lin and I both have this on our radars (feel free to re-assign, John). Setting S2 with room for reduction if this turns out to be something like a diagnostic issue.
Reporter | ||
Comment 3•5 years ago
|
||
Since this is dealing with an instrumented library we will likely need to add it to the ignore list. However I'd like to check with an instrumented build of the library before doing so to verify this is a false positive.
Assignee | ||
Comment 4•5 years ago
|
||
Looking a bit more at this.
We have two threads that could be racing. T40 and T50. T40 is created by T39, and T50 is created by T43.
The T43 -> T50 chain is reproduced below:
Thread T50 (tid=3050, running) created by thread T43 at:
#0 pthread_create /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:967:3 (firefox+0x584eb)
#1 <null> <null> (libavcodec.so.58+0x621203)
#2 mozilla::FFmpegVideoDecoder<58>::Init() src/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp:384:8 (libxul.so+0x3fbdbc9)
#3 operator() src/dom/media/platforms/wrappers/MediaChangeMonitor.cpp:257:21 (libxul.so+0x3f5a20c)
#4 mozilla::detail::ProxyFunctionRunnable<mozilla::MediaChangeMonitor::Init()::$_0, mozilla::MozPromise<mozilla::TrackInfo::TrackType, mozilla::MediaResult, true> >::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1564:29 (libxul.so+0x3f5a20c)
#5 mozilla::TaskQueue::Runner::Run() src/xpcom/threads/TaskQueue.cpp:158:20 (libxul.so+0xb0e2c2)
#6 nsThreadPool::Run() src/xpcom/threads/nsThreadPool.cpp:299:14 (libxul.so+0xb265c6)
#7 nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1234:14 (libxul.so+0xb1ef56)
#8 NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:513:10 (libxul.so+0xb24112)
#9 mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:302:20 (libxul.so+0x13da70e)
#10 RunInternal src/ipc/chromium/src/base/message_loop.cc:334:10 (libxul.so+0x136611c)
#11 RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3 (libxul.so+0x136611c)
#12 MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3 (libxul.so+0x136611c)
#13 nsThread::ThreadFunc(void*) src/xpcom/threads/nsThread.cpp:447:10 (libxul.so+0xb1b005)
#14 _pt_root src/nsprpub/pr/src/pthreads/ptthread.c:201:5 (libnspr4.so+0x44daf)
Previous write of size 8 at 0x7b1000b89a80 by thread T50 (mutexes: write M217997267042709800):
#0 posix_memalign /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:840:3 (firefox+0x57fa1)
#1 av_malloc <null> (libavutil.so.56+0x3df8c)
T43 is initializing a decoder. The MediaChangeMonitor is an object we use to recreate decoders if the media being decoded changes, it sits on top of most of our decoders so it can detect such changes. In this case, it looks like we're just doing first init of a decoder by calling into our ffmpeg decoder's init code [0, 1, 2]. It looks like we have some inlining and we don't have symbols for ffmpeg, so it's not 100% clear what is happening, but it appears some ffmpeg function we call while initializing our video decoder here creates a new thread (T50) and calls av_malloc
which calls posix_memalign
on that thread.
T39 -> T40 chain:
Thread T40 'MediaPD~oder #1' (tid=3028, running) created by thread T39 at:
#0 pthread_create /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:967:3 (firefox+0x584eb)
#1 _PR_CreateThread src/nsprpub/pr/src/pthreads/ptthread.c:458:14 (libnspr4.so+0x3bfb3)
#2 PR_CreateThread src/nsprpub/pr/src/pthreads/ptthread.c:533:12 (libnspr4.so+0x308e2)
#3 nsThread::Init(nsTSubstring<char> const&) src/xpcom/threads/nsThread.cpp:659:8 (libxul.so+0xb1c5ce)
#4 nsThreadManager::NewNamedThread(nsTSubstring<char> const&, unsigned int, nsIThread**) src/xpcom/threads/nsThreadManager.cpp:623:12 (libxul.so+0xb2380a)
#5 NS_NewNamedThread(nsTSubstring<char> const&, nsIThread**, already_AddRefed<nsIRunnable>, unsigned int) src/xpcom/threads/nsThreadUtils.cpp:161:57 (libxul.so+0xb29c98)
#6 NS_NewNamedThread src/xpcom/threads/nsThreadUtils.cpp:152:10 (libxul.so+0xb25ad6)
#7 nsThreadPool::PutEvent(already_AddRefed<nsIRunnable>, unsigned int) src/xpcom/threads/nsThreadPool.cpp:115:17 (libxul.so+0xb25ad6)
#8 nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) src/xpcom/threads/nsThreadPool.cpp:350:5 (libxul.so+0xb26ebd)
#9 non-virtual thunk to nsThreadPool::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) src/xpcom/threads/nsThreadPool.cpp (libxul.so+0xb273a8)
#10 mozilla::SharedThreadPool::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) /builds/worker/workspace/obj-build/dist/include/mozilla/SharedThreadPool.h:70:42 (libxul.so+0xb076d0)
#11 mozilla::TaskQueue::DispatchLocked(nsCOMPtr<nsIRunnable>&, unsigned int, mozilla::AbstractThread::DispatchReason) src/xpcom/threads/TaskQueue.cpp:65:26 (libxul.so+0xb0d6b1)
#12 mozilla::TaskQueue::Dispatch(already_AddRefed<nsIRunnable>, unsigned int) /builds/worker/workspace/obj-build/dist/include/mozilla/TaskQueue.h:71:14 (libxul.so+0xb2eaf8)
#13 InvokeAsync<(lambda at src/dom/media/MediaFormatReader.cpp:723:22)> /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1606:12 (libxul.so+0x3bf1e3b)
#14 InvokeAsync<(lambda at src/dom/media/MediaFormatReader.cpp:723:22)> /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1620:10 (libxul.so+0x3bf1e3b)
#15 mozilla::MediaFormatReader::DemuxerProxy::Init() src/dom/media/MediaFormatReader.cpp:722:10 (libxul.so+0x3bf1e3b)
#16 mozilla::MediaFormatReader::AsyncReadMetadata() src/dom/media/MediaFormatReader.cpp:1076:13 (libxul.so+0x3bf8a65)
#17 applyImpl<mozilla::MediaFormatReader, RefPtr<mozilla::MozPromise<mozilla::MetadataHolder, mozilla::MediaResult, true> > (mozilla::MediaFormatReader::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1188:12 (libxul.so+0x3d350fb)
#18 apply<mozilla::MediaFormatReader, RefPtr<mozilla::MozPromise<mozilla::MetadataHolder, mozilla::MediaResult, true> > (mozilla::MediaFormatReader::*)()> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1194:12 (libxul.so+0x3d350fb)
#19 Invoke /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1437:47 (libxul.so+0x3d350fb)
#20 mozilla::detail::ProxyRunnable<mozilla::MozPromise<mozilla::MetadataHolder, mozilla::MediaResult, true>, RefPtr<mozilla::MozPromise<mozilla::MetadataHolder, mozilla::MediaResult, true> > (mozilla::MediaFormatReader::*)(), mozilla::MediaFormatReader>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1457:42 (libxul.so+0x3d350fb)
#21 mozilla::AutoTaskDispatcher::TaskGroupRunnable::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/TaskDispatcher.h:228:35 (libxul.so+0xb0970b)
#22 mozilla::TaskQueue::Runner::Run() src/xpcom/threads/TaskQueue.cpp:158:20 (libxul.so+0xb0e2c2)
#23 nsThreadPool::Run() src/xpcom/threads/nsThreadPool.cpp:299:14 (libxul.so+0xb265c6)
#24 nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1234:14 (libxul.so+0xb1ef56)
#25 NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:513:10 (libxul.so+0xb24112)
#26 mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:332:5 (libxul.so+0x13da778)
#27 RunInternal src/ipc/chromium/src/base/message_loop.cc:334:10 (libxul.so+0x136611c)
#28 RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3 (libxul.so+0x136611c)
#29 MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3 (libxul.so+0x136611c)
#30 nsThread::ThreadFunc(void*) src/xpcom/threads/nsThread.cpp:447:10 (libxul.so+0xb1b005)
#31 _pt_root src/nsprpub/pr/src/pthreads/ptthread.c:201:5 (libnspr4.so+0x44daf)
Write of size 8 at 0x7b1000b89a80 by thread T40:
#0 free /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/tsan/rtl/tsan_interceptors.cc:706:3 (firefox+0x57258)
#1 av_buffer_unref <null> (libavutil.so.56+0x1c500)
#2 mozilla::FFmpegDataDecoder<58>::DoDecode(mozilla::MediaRawData*, bool*, nsTArray<RefPtr<mozilla::MediaData> >&) src/dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:184:10 (libxul.so+0x3fbaf7a)
#3 mozilla::FFmpegDataDecoder<58>::ProcessDecode(mozilla::MediaRawData*) src/dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp:140:20 (libxul.so+0x3fbaac2)
#4 applyImpl<mozilla::FFmpegDataDecoder<LIBAV_VER>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true> > (mozilla::FFmpegDataDecoder<LIBAV_VER>::*)(mozilla::MediaRawData *), StoreRefPtrPassByPtr<mozilla::MediaRawData> , 0> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1188:12 (libxul.so+0x3fc180e)
#5 apply<mozilla::FFmpegDataDecoder<LIBAV_VER>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true> > (mozilla::FFmpegDataDecoder<LIBAV_VER>::*)(mozilla::MediaRawData *)> /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:1194:12 (libxul.so+0x3fc180e)
#6 Invoke /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1437:47 (libxul.so+0x3fc180e)
#7 mozilla::detail::ProxyRunnable<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true>, RefPtr<mozilla::MozPromise<nsTArray<RefPtr<mozilla::MediaData> >, mozilla::MediaResult, true> > (mozilla::FFmpegDataDecoder<58>::*)(mozilla::MediaRawData*), mozilla::FFmpegDataDecoder<58>, mozilla::MediaRawData*>::Run() /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:1457:42 (libxul.so+0x3fc180e)
#8 mozilla::TaskQueue::Runner::Run() src/xpcom/threads/TaskQueue.cpp:158:20 (libxul.so+0xb0e2c2)
#9 nsThreadPool::Run() src/xpcom/threads/nsThreadPool.cpp:299:14 (libxul.so+0xb265c6)
#10 nsThread::ProcessNextEvent(bool, bool*) src/xpcom/threads/nsThread.cpp:1234:14 (libxul.so+0xb1ef56)
#11 NS_ProcessNextEvent(nsIThread*, bool) src/xpcom/threads/nsThreadUtils.cpp:513:10 (libxul.so+0xb24112)
#12 mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) src/ipc/glue/MessagePump.cpp:302:20 (libxul.so+0x13da70e)
#13 RunInternal src/ipc/chromium/src/base/message_loop.cc:334:10 (libxul.so+0x136611c)
#14 RunHandler src/ipc/chromium/src/base/message_loop.cc:327:3 (libxul.so+0x136611c)
#15 MessageLoop::Run() src/ipc/chromium/src/base/message_loop.cc:309:3 (libxul.so+0x136611c)
#16 nsThread::ThreadFunc(void*) src/xpcom/threads/nsThread.cpp:447:10 (libxul.so+0xb1b005)
#17 _pt_root src/nsprpub/pr/src/pthreads/ptthread.c:201:5 (libnspr4.so+0x44daf)
T39 involves us setting up a MediaFormatReader, this is an object which grabs data from a demuxer and feeds it to decoders. Somewhere in there it's creating T40 which is doing some decoding[4]. This thread calls into ffmpeg to do the decode, which calls av_buffer_unref
somewhere, and touches the memory from the earlier threads.
I'm not super familiar with the TSAN reporting scheme. I'm reading the report as saying that during the first write a mutex is held -- (mutexes: write M217997267042709800)
. However, that mutex has already been destroyed by the second write.
I beleive ffmpeg does its own internal locking. There used to be some ability to interact with this, but it's been deprecated[5] and the expectation is that if you build with thread support ffmpeg should provide some amount of safety, though it's not clear to me the extent of this safety. It seems possible that ffmpeg is internally aware of these structures and is handling safety. E.g. ffmpeg documentation indicates that 'Referencing and unreferencing the buffers is thread-safe and thus may be done from multiple threads simultaneously without any need for additional locking.' This doesn't mean for sure this is safe, but I think is useful context.
[0] https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/dom/media/platforms/wrappers/MediaChangeMonitor.cpp#257
[1] https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp#384
[2] https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp#70
[3] https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/dom/media/MediaFormatReader.cpp#722
[4] https://searchfox.org/mozilla-central/rev/d6d8fcc22c3820f2ae08229e0d37be19fba74db9/dom/media/platforms/ffmpeg/FFmpegDataDecoder.cpp#147
[5] https://github.com/FFmpeg/FFmpeg/blob/a54b367c781f7735c321e6ac08a5deebeb9796a9/doc/APIchanges#L313
[6] https://ffmpeg.org/doxygen/trunk/group__lavu__buffer.html
Reporter | ||
Comment 5•5 years ago
|
||
I verified this is a false positive with TSan instrumented libraries.
Reporter | ||
Comment 6•5 years ago
|
||
Comment 8•5 years ago
|
||
bugherder |
Description
•